OBJECTIVES

The  original  goals  of  this  basic  research  project  were  to  perform  experiments  to 
understand  how  attention  to  auditory  and  multimodal  objects  affects  performance  in 
demanding  settings,  especially  those  in  which  competition  for  attention  limits  human 
abilities.  Aim  1  was  to  test  the  effect  of  spatial  separation  of  sources  on  selective  and 
divided  auditory  attention,  including  whether  knowledge  of  spatial  configuration  affects 
performance.  Aim  2  was  to  test  whether  selective  attention  to  space  suppresses  responses 
to  objects  at  unattended  locations.  Aim  3  was  to  explore  the  similarities  between  attention 
to  spatial  location  and  other  features  of  an  auditory  object.  Aim  4  was  to  determine 
whether  traditional  models  of  spatial  hearing  can  account  for  the  effect  of 
spatial  auditory  attention. 

COMPARISON  BETWEEN  ACTUAL  ACCOMPLISHMENTS  AND  GOALS 

During  the  course  of  the  three-year  grant  period,  we  made  excellent  progress  on  all  four 
of  the  original  aims.  However,  early  in  the  grant  period,  we  realized  that  the  ability  of  a 
listener  to  attend  to  an  audio  or  audio-visual  object  depends  critically  on  how  a  mixture 
of  sounds  is  perceptually  organized  into  objects.  As  a  result,  we  redirected  some  of  our 
efforts  to  a  new  goal  (Aim  5),  investigating  the  more  basic  question  of  how  listeners 
parse  an  ambiguous  mixture  of  sound  coming  from  multiple  sound  sources  into 
perceptual  objects  to  which  they  can  attend.  Accomplishments  in  all  five  areas  are 
summarized  in  subsequent  sections  of  this  report. 

APPROACH

Behavioral  experiments  were  conducted  to  explore  the  ability  of  listeners  to  separate, 
understand,  and  identify  messages  from  competing  auditory  objects  (using  human  talkers, 
songbird  calls,  and  complex  harmonic  spectrotemporal  patterns).  Spatial  cues  in  the 
acoustic  signals  were  manipulated  to  explore  how  spatial  information  influences 
performance.  In  some  experiments,  multiple  loudspeakers  were  used  to  present  sounds 
from  different  locations.  In  other  experiments,  realistic  spatial  cues  were  simulated  using 
virtual  auditory  space  techniques.  In  auditory  mixtures,  sounds  add  before  reaching  the 
listener's  ears;  thus,  competing  sounds  interfere  with  one  another  at  the  most  peripheral
representation  of  sound  in  the  auditory  neural  pathway.  To  control  for  such  peripheral 
effects  in  some  experiments,  competing  speech  signals  were  processed  to  reduce  spectral 
overlap  in  order  to  focus  on  the  effects  of  central  attentional  limitations.  In  selective 
attention  tasks,  listeners  reported  the  identity  or  content  of  one  source  in  the  mixture.  In 
divided  attention  tasks,  listeners  reported  the  content  of  both  competing  sources.  In
segregation  tasks,  perceptual  organization  of  the  sound  mixture  was  measured  indirectly, 
by  measuring  the  contributions  of  ambiguous  sound  elements  to  object  identity  and/or  to 
object  location. 
ACCOMPLISHMENTS


We  investigated  how  spatial  configuration  and  spatial  knowledge  affect  the  ability  to  hear 
out  and  understand  an  auditory  target  in  a  complex  environment.  Overall,  results  show 
that  the  role  of  spatial  cues  in  hearing  out  and  understanding  auditory  sources  depends  on 
the  complexity  of  the  listening  environment.  More  specifically,  spatial  configuration  of 
the  sources  in  an  environment  and  a  priori  knowledge  of  this  configuration  play  a 
prominent  role  in  helping  the  listener,  and  their  importance  increases  with  the  complexity 
of  the  environment  (i.e.,  increasing  with  the  number  and  similarity  of  the  sources  in  the 
sound  mixture;  Aims  1  and  2).  Results  also  show  that  spatial  cues  play  multiple  roles
in  helping  listeners  operate  in  complex  environments  (Aims  1,  2,
3,  and  5).  Spatial  cues  help  us  to  separate  the  acoustic  energy  in  the  environment  into 
discrete  objects  or  streams,  and  attend  to  an  object  of  interest  from  a  particular  location. 
Comparisons  between  attention  to  source  location,  source  timbre,  and  source  level  show 
that  any  one  of  these  features  can  be  used  to  select  out  a  source  from  a  mixture,  and  that 
different  listeners  rely  on  these  different  cues  to  different  degrees.  We  showed  that  spatial 
cues  also  influence  the  formation  of  auditory  objects,  particularly  when  the  objects  are 
grouped  across  time.  Moreover,  it  appears  that  the  way  in  which  an  ambiguous  sound 
mixture  is  parsed  into  objects  depends  upon  what  object  is  attended.  Our  analysis 
demonstrates  that  past  models  cannot  account  for  the  improvements  that  we  observe  in 
complex  settings,  when  attention  (rather  than  audibility)  is  the  main  limitation  on 
performance  (Aim  4). 

Selective  attention 

We  found  that  listeners  attending  to  one  message  in  the  presence  of  similar  messages 
benefit  from  perceived  spatial  separation  between  the  target  source  and  its  competitors. 
Moreover,  once  the  relative  levels  of  the  target  and  masker(s)  at  the  ears  are  taken  into 
account,  the  type  of  spatial  cues  giving  rise  to  differences  in  perceived  location  has  no 
effect  on  this  spatial  gain.  In  particular,  the  gain  is  identical  when  the  separated  sources 
contain  “realistic”  cues  (containing  all  of  the  natural  spatial  cues  that  arise  in  the  real 
world),  only  interaural  phase  differences  (a  strong  left-right  direction  cue),  or  only 
differences  in  interaural  level  and  interaural  envelope  timing  (weak  cues  for  left-right 
direction).  This  work,  published  in  Acta  Acustica  united  with  Acustica  in  2005,  has
important  implications  for  display  of  auditory  information  to  a  human  operator,  as  it 
suggests  that  in  some  environments,  even  extremely  simple  spatial  cues  (e.g.,  simply 
delaying  the  sound  at  one  ear  relative  to  the  other)  can  provide  all  of  the  possible  benefits 
of  doing  more  complicated,  expensive  real-time  processing  to  render  auditory  events  at 
different  locations. 
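
As a rough illustration of the simplest cue mentioned above (delaying the sound at one ear relative to the other), the following Python sketch turns a mono signal into a left/right pair with a pure interaural time difference. The function name, the 44.1 kHz sample rate, and the 500-microsecond delay are illustrative assumptions, not parameters taken from the experiments.

```python
def apply_itd(mono, sample_rate_hz, itd_seconds):
    """Return (left, right) sample lists with a pure interaural delay.

    Positive itd_seconds delays the left ear, so the source tends to be
    heard toward the right; negative values delay the right ear instead.
    The delay is rounded to a whole number of samples (an assumption made
    for simplicity; a real renderer would likely interpolate).
    """
    delay = round(abs(itd_seconds) * sample_rate_hz)
    pad = [0.0] * delay
    leading = list(mono) + pad      # ear that receives the sound first
    lagging = pad + list(mono)      # ear that receives the delayed copy
    if itd_seconds >= 0:
        return lagging, leading     # left lags, right leads
    return leading, lagging         # left leads, right lags


# Hypothetical example: a short click, delayed by about 500 microseconds.
click = [1.0, 0.5, 0.25]
left, right = apply_itd(click, sample_rate_hz=44100, itd_seconds=500e-6)
```

Both channels are padded to the same length so they can be played back together; no level or spectral cues are altered, which is what makes this the "extremely simple" case.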

Using  a  new  audiovisual  paradigm,  we  showed  that  simple  visual  cues  are  effective  at
guiding  attention  to  auditory  objects  in  complex  multi-source  environments  (in  press  in 
Journal  of  the  Association  for  Research  in  Otolaryngology).  Lights  indicating  where  to 
listen  for  a  target  embedded  in  similar  masking  sources  from  different  directions  give 
consistent  improvements  in  performance.  The  results  demonstrate  that  knowing  where  to 
listen  is  extremely  useful  for  guiding  attention  in  a  complex  acoustic  scene.  On  the  other
hand,  cues  that  indicate  when  to  listen  are  only  helpful  if  the  target  source  is  extremely
difficult  to  hear  out  of  the  mixture  (e.g.,  when  the  target  is  a  familiar  birdsong  that
is  embedded  in  other,  unfamiliar  birdsongs).  For  target  signals  that  are  relatively  salient 
(such  as  speech  targets  embedded  in  reversed  speech  maskers),  cues  for  when  to  listen  do 
not  aid  performance.  This  demonstrates  that  visual  cues  that  are  temporally  correlated 
with  a  target  can  increase  listener  vigilance  and  enhance  the  segregation  of  a  target  from 
a  confusing  mixture  of  sources. 

Further  experiments  investigating  selective  attention  to  speech  demonstrate  that  listeners 
can  use  non-spatial  features  such  as  timbre  to  focus  attention  on  a  desired  talker.  While 
overall  performance  does  not  improve  when  sources  are  spatially  separated  and  listeners 
are  attending  to  timbre  (rather  than  location),  the  pattern  of  errors  depends  on  spatial 
separation.  When  sources  are  close  together  and  listeners  are  attending  to  timbre,  they  are 
more  likely  to  confuse  the  two  talkers,  and  report  a  mixture  of  key  words  from  both 
talkers.  When  sources  are  separated  and  listeners  are  attending  to  timbre,  the  overall 
number  of  correct  responses  is  the  same,  but  listeners  rarely  report  mixtures  of  the 
talkers;  instead,  listeners  are  more  likely  to  report  all  keywords  from  the  wrong  message. 
This  result  shows  that  spatial  separation  helps  listeners  properly  separate  sources 
perceptually,  even  if  space  is  not  being  used  to  guide  attention.  This  result  gives  very 
strong  evidence  that  in  a  complex,  confusing  situation,  spatial  auditory  displays  can 
provide  critical  information  aiding  a  listener  in  organizing  the  acoustic  scene,  even  if  they
do  not  know  the  location  of  the  source  that  is  most  important  at  a  given  moment. 
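
The error analysis described above distinguishes three outcomes: all keywords from the target, all keywords from the wrong message, and a mixture of the two. A minimal Python sketch of such a scoring rule follows. The function name, the labels, and the keyword format are assumptions for illustration; this is not the scoring procedure used in the study.

```python
def classify_response(reported, target_keywords, masker_keywords):
    """Label one trial by comparing reported keywords to each talker.

    Each argument is a sequence of keywords in the same positional order
    (for example, a color followed by a number). Labels are hypothetical:
      "correct"       - every reported keyword matches the target talker
      "wrong_message" - every reported keyword matches the masker talker
      "mixture"       - every keyword matches one talker or the other,
                        but neither talker accounts for all of them
      "other"         - at least one keyword matches neither talker
    """
    if len(reported) != len(target_keywords) or len(reported) != len(masker_keywords):
        raise ValueError("all keyword sequences must have the same length")
    from_target = [r == t for r, t in zip(reported, target_keywords)]
    from_masker = [r == m for r, m in zip(reported, masker_keywords)]
    if all(from_target):
        return "correct"
    if all(from_masker):
        return "wrong_message"
    if all(t or m for t, m in zip(from_target, from_masker)):
        return "mixture"
    return "other"
```

Under this rule, the finding reported above would appear as a shift from "mixture" responses toward "wrong_message" responses as spatial separation increases, with the "correct" count unchanged.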

Divided  listening 

In  a  divided  listening  paradigm,  we  examined  the  effect  of  spatial  separation  on  the 
ability  of  listeners  to  report  both  messages  in  a  simultaneous  pair  of  speech  sources. 
Results  demonstrate  that  spatial  separation  improves  divided  listening,  primarily  by 
enhancing  intelligibility  of  the  less  intense  talker.  In  another,  similar  study,  we  directly
assessed  the  “cost”  of  dividing  attention  between  sources  of  equal  intensity  (published  in
the  Journal  of  the  Acoustical  Society  of  America  in  2006).  Competing  messages  were 
presented  either  with  a  small,  moderate,  or  large  spatial  separation.  In  separate  blocks,  we 
measured  the  ability  of  subjects  to  report  the  message  from  one  source  or  from  both 
sources.  We  assessed  the  “cost”  of  dividing  attention  by  directly  comparing  dual-task 
performance  to  the  single-task  performance.  Results  show  that  there  is  a  small  increase  in
the  dual-task  cost  as  the  spatial  separation  between  the  messages  increases.  This  finding 
is  consistent  with  a  “spatial  spotlight”  model  in  which  listeners  are  worse  on  a  divided 
task  when  the  sources  are  far  apart. 
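
The cost measure can be written as the difference between single-task and dual-task performance at each separation. The sketch below assumes performance is expressed as proportion correct; the function name and the numbers in the example are hypothetical and are not data from the study.

```python
def dual_task_cost(single_task_correct, dual_task_correct):
    """Cost of dividing attention, as a drop in proportion correct.

    Both arguments are proportions in [0, 1] for the same source and the
    same spatial separation. A positive result means performance was
    worse when both messages had to be reported.
    """
    for p in (single_task_correct, dual_task_correct):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie between 0 and 1")
    return single_task_correct - dual_task_correct


# Hypothetical proportions correct, keyed by spatial separation.
single = {"small": 0.90, "moderate": 0.92, "large": 0.93}
dual = {"small": 0.70, "moderate": 0.70, "large": 0.69}
costs = {sep: dual_task_cost(single[sep], dual[sep]) for sep in single}
```

A cost that grows from the small to the large separation, as in these made-up numbers, is the pattern the "spatial spotlight" account predicts.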

Detailed  analyses  of  these  two  divided  listening  experiments  suggest  that  listeners  cannot 
listen  to  more  than  one  source  at  a  time.  Instead,  in  order  to  report  the  messages  from 
both  of  two  simultaneous  talkers,  listeners  appear  to  select  one  source  to  attend  actively, 
and  then  recall  a  sensory  trace  of  the  stimulus  from  memory  in  order  to  report  the  “lower
priority”  talker.  Moreover,  prior  work  suggests  that  such  a  memory  trace  degrades  rapidly
with  time,  so  that  the  lower-priority  message  cannot  be  recalled  with  any
accuracy  unless  listeners  can  recall  this  memory  trace  soon  after  the  message  ends.  In
complex  listening  situations,  a  listener  will  be  able  to  extract  the  meaning  of  the  source 
that  they  actively  attend  during  the  stimulus  presentation.  However,  they  will  only  be 
able  to  report  the  lower-priority  target  message  if  the  message  they  are  actively
attending  is  short:  the  longer  the  messages  involved,  the  less  able  they  will  be  to  recall
the  lower-priority  message.  In  divided  listening,  the  strategy  adopted  by  listeners  depends 
on  their  expectation.  If  one  of  the  talkers  is  expected  to  be  low  in  intensity  compared  to 
the  other  talker,  listeners  pay  attention  to  the  talker  that  they  expect  to  be  harder  to  hear. 
If  listeners  are  presented  with  two  talkers  that  are  equally  intense  and  are  instructed 
which  talker  to  report  first,  they  appear  to  actively  attend  to  the  source  that  they  must 
report  first.  In  particular,  performance  is  significantly  lower  for  the  “lower  priority” 
source  that  they  report  second.  This  result  demonstrates  that  listeners  automatically  adopt 
strategies  that  are  near  optimal  for  the  expected  listening  situation.  This  conclusion  is
important,  as  it  suggests  that  listeners  should  be  given  all
available  information  about  the  listening  situation.  Such  knowledge  will  be  used
automatically  by  the  listener  to  determine  how  to  allocate  attention  optimally,  based  on 
the  situation. 

Sound  source  segregation 

We  have  conducted  a  series  of  experiments  exploring  the  degree  to  which  spatial  cues 
directly  influence  how  listeners  interpret  a  mixture  of  acoustic  energy  (distributed  over 
time  and  frequency)  and  form  auditory  objects.  We  created  sound  mixtures  in  which 
subjects  would  hear  two  distinct  objects:  a  repeating  tone  and,  intermingled  in  time,  a 
harmonic  complex.  The  two  objects  were  constructed  such  that  there  was  a  target  tone 
that  logically  could  belong  to  either  of  the  two  streams  (i.e.,  its  frequency  matched  the 
repeating  tone  and  was  rhythmically  consistent  with  a  tone  in  that  repeating  pattern; 
however,  it  also  was  harmonically  related  to  the  complex  and  was  turned  on  and  off 
simultaneously  with  the  complex).  We  then  investigated  how  changing  the  spatial 
properties  of  the  elements  influenced  the  perceived  streams.  We  found  that  spatial  cues 
had  a  very  large  influence  on  the  degree  to  which  the  target  fell  within  the  repeating  tone 
sequence,  but  only  a  modest  influence  on  the  degree  to  which  the  target  was  heard  as  part 
of  the  harmonic  complex.  Even  more  importantly,  we  found  that  these  independent 
measures  had  little  to  do  with  one  another:  that  is,  the  degree  to  which  the  target  was 
heard  as  part  of  the  tone  sequence  had  little  power  in  predicting  the  degree  to  which  it 
was  heard  in  the  harmonic  complex.  These  results  are  consistent  with  recent  views  of 
segregation  and  streaming,  which  implicate  attention  in  the  process.  We  believe  that  the 
way  in  which  the  acoustic  mixture  is  parsed  depends  on  which  object  the  listener  attends. 
Thus,  we  find  that  spatial  cues  have  a  very  strong  role  in  how  objects  are  formed  across 
time,  but  a  weak  role  in  how  objects  are  grouped  across  frequency.  This  result  has 
important  implications  for  how  spatial  cues  may  be  used  to  reduce  interference  between 
competing  sound  messages:  spatial  cues  can  be  used  to  attend  to  a  stream  of  information 
over  time,  but  cannot,  in  isolation,  help  a  listener  hear  out  one  sound  element  from  a 
mixture  of  sound  that  is  otherwise  heard  as  a  single  object. 
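
The frequency constraints on the ambiguous target tone can be made concrete with a small check. The Python sketch below tests whether a candidate target frequency both matches the repeating tone and falls on a harmonic of the complex. All frequencies, the tolerance, and the function name are illustrative assumptions rather than the stimulus values used in the experiments.

```python
def is_ambiguous_target(target_hz, repeating_tone_hz, fundamental_hz, tol_hz=0.5):
    """Return True if the target could belong to either stream.

    Two conditions from the stimulus description are checked:
      1. the target frequency matches the repeating tone, and
      2. the target is harmonically related to the complex, here taken
         to mean it lies on an integer multiple of the fundamental.
    Timing (rhythm and common onset/offset) is not modeled here.
    """
    matches_tone = abs(target_hz - repeating_tone_hz) <= tol_hz
    harmonic_number = round(target_hz / fundamental_hz)
    on_harmonic = (harmonic_number >= 1 and
                   abs(target_hz - harmonic_number * fundamental_hz) <= tol_hz)
    return matches_tone and on_harmonic
```

For example, with a hypothetical 125 Hz fundamental, a 500 Hz target matching a 500 Hz repeating tone satisfies both conditions, since it sits on the fourth harmonic.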

We  have  also  measured  how  ambiguous  sound  elements  contribute  to  the  perceived 
locations  of  competing  objects.  These  results  show  a  dissociation  between  how  sound 


elements  contribute  to  perceived  object  content  or  identity,  versus  how  they  contribute  to 
perceived  object  location.  In  particular,  we  find  that  sound  elements  may  not  be 
perceived  as  part  of  a  particular  object,  yet  can  still  contribute  to  that  object’s  perceived 
location. 

IMPACT/APPLICATIONS 

In  most  modern  command  and  control  environments,  highly  trained  human  operators  are 
expected  to  deal  with  extremely  complex  scenarios  and  process  large  amounts  of 
information.  One  of  the  limiting  factors  in  such  settings  is  the  human’s  ability  to 
simultaneously  monitor,  prioritize,  and  react  to  competing  sources  of  information.  By 
investigating  how  stimulus  attributes  such  as  spatial  cues  influence  the  ability  of  the 
human  to  sort  out  and  separate  competing  sound  sources  and  focus  attention  on  a  source 
of  interest,  new  insights  into  the  perceptual  and  cognitive  capabilities  of  the  human 
operator  will  be  gained.  Such  knowledge  is  critical  for  designing  displays  that  allow  a 
human  operator  to  cope  with  multiple,  competing  sources  of  information  in  natural, 
effective  ways. 